Problem Statement:

The dataset was collected from a hospital over a period of nearly two months. We will use it to predict chronic kidney disease.

Data Set Information:

age - age

bp - blood pressure

sg - specific gravity

al - albumin

su - sugar

rbc - red blood cells

pc - pus cell

pcc - pus cell clumps

ba - bacteria

bgr - blood glucose random

bu - blood urea

sc - serum creatinine

sod - sodium

pot - potassium

hemo - hemoglobin

pcv - packed cell volume

wc - white blood cell count

rc - red blood cell count

htn - hypertension

dm - diabetes mellitus

cad - coronary artery disease

appet - appetite

pe - pedal edema

ane - anemia

class - class (here ckd means the patient has chronic kidney disease, and not ckd indicates its absence)

As we can see, 'packed_cell_volume', 'white_blood_cell_count' and 'red_blood_cell_count' are of object type. We need to convert them to a numerical dtype.
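
One way to do this conversion is sketched below. The sample values, including the stray tab and '?' entries, are hypothetical and only illustrate why such columns end up as object dtype; the three column names come from the note above.

```python
import pandas as pd

# Hypothetical sample: the tab-prefixed and '?' entries are assumed, not taken from the dataset.
df = pd.DataFrame({
    "packed_cell_volume": ["44", "38", "\t43", "?"],
    "white_blood_cell_count": ["7800", "6000", "\t?", "7500"],
    "red_blood_cell_count": ["5.2", "?", "4.6", "3.9"],
})

numeric_cols = ["packed_cell_volume", "white_blood_cell_count", "red_blood_cell_count"]
for col in numeric_cols:
    # Strip stray whitespace first, then let errors="coerce" turn
    # anything that still cannot be parsed into NaN instead of raising.
    df[col] = pd.to_numeric(df[col].str.strip(), errors="coerce")

print(df.dtypes)
```

Coerced entries become missing values, so they would need to be handled together with the other missing values later on.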

There is some ambiguity present in the columns; we have to remove it.
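
The note above does not say what form the ambiguity takes. A minimal sketch, assuming it consists of labels padded with tabs or spaces (a hypothetical case), and using assumed column names:

```python
import pandas as pd

# Hypothetical values: whitespace-padded labels are an assumed form of ambiguity,
# and 'diabetes_mellitus' is an assumed column name.
df = pd.DataFrame({
    "diabetes_mellitus": ["yes", "\tno", " yes", "no"],
    "class": ["ckd", "ckd\t", "notckd", "notckd"],
})

for col in ["diabetes_mellitus", "class"]:
    print(col, df[col].unique())   # inspect the raw labels first
    df[col] = df[col].str.strip()  # drop leading/trailing whitespace and tabs
    print(col, df[col].unique())   # confirm only the intended labels remain
```

Printing the unique values before and after makes it easy to confirm that each column ends up with only the intended labels.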

Skewness is present in some of the columns.
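
Skewness can be quantified per column with pandas. The sketch below uses made-up values, with one column deliberately right-skewed by a single large value; the column names are assumptions.

```python
import pandas as pd

# Hypothetical values chosen so that one column is right-skewed and one is symmetric.
df = pd.DataFrame({
    "serum_creatinine": [0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 15.0],
    "hemoglobin": [11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0],
})

# Positive values indicate a long right tail, negative values a long left tail.
skew = df.skew(numeric_only=True)
print(skew.sort_values(ascending=False))
```

Strongly skewed columns are usually better summarised by the median than by the mean, which matters when choosing how to fill missing values.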

EDA - Exploratory Data Analysis:

Let's draw violin and KDE plots to present the available values...

Scatter Plots with the combinations.

Bar Graphs on the composition...

Pre-Processing of Data

All the missing values are handled now; let's encode the categorical features.

As all of the categorical columns have 2 categories, we can use a label encoder.
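
A minimal sketch of this step with scikit-learn's LabelEncoder. The two columns and their values are hypothetical stand-ins for the dataset's categorical features.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical two-category columns standing in for the real ones.
df = pd.DataFrame({
    "hypertension": ["yes", "no", "yes", "no"],
    "appetite": ["good", "poor", "good", "good"],
})

cat_cols = ["hypertension", "appetite"]
encoder = LabelEncoder()
for col in cat_cols:
    # Guard the assumption that every categorical column is binary.
    assert df[col].nunique() == 2
    # Classes are sorted alphabetically, so 'no' -> 0 and 'yes' -> 1.
    df[col] = encoder.fit_transform(df[col])

print(df)
```

With only two categories per column, the integer codes 0 and 1 do not impose a misleading order, which is why label encoding is sufficient here.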

Model Building Activity...

K-Nearest Neighbors Algorithm
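
A rough sketch of how a KNN model could be fitted and scored. The data is synthetic (generated with make_classification) and stands in for the pre-processed dataset; the split ratio, the number of neighbors and the scaling step are all assumptions, not the notebook's settings.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded features and the target; sizes are arbitrary.
X, y = make_classification(n_samples=400, n_features=24, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# KNN relies on distances, so the features are standardised first.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)

knn_acc = accuracy_score(y_test, knn.predict(X_test))
print(f"KNN test accuracy: {knn_acc:.3f}")
```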

Decision Tree Classifier

Random Forest Classifier

AdaBoost Classifier

Gradient Boosting Classifier

SGB - Stochastic Gradient Boosting
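
In scikit-learn, gradient boosting becomes stochastic when each tree is fitted on a random fraction of the training rows, i.e. when subsample is below 1.0. A hedged sketch on synthetic data; the hyperparameter values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the pre-processed dataset.
X, y = make_classification(n_samples=400, n_features=24, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# subsample < 1.0 draws a random 80% of the rows for each tree;
# max_features additionally samples a fraction of the columns at each split.
sgb = GradientBoostingClassifier(
    n_estimators=200, max_depth=3, subsample=0.8, max_features=0.75, random_state=0
)
sgb.fit(X_train, y_train)

sgb_acc = accuracy_score(y_test, sgb.predict(X_test))
print(f"SGB test accuracy: {sgb_acc:.3f}")
```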

XGBoost

CatBoost Classifier

Extra Trees Classifier

LGBM Classifier

Comparison of built Models
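
One possible way to build the comparison is to collect each model's test accuracy in a DataFrame and sort it. The sketch below uses synthetic data and default hyperparameters, and covers only the scikit-learn models so that it stays self-contained; XGBoost, CatBoost and LGBM could be added to the same dictionary if those packages are installed.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the pre-processed dataset.
X, y = make_classification(n_samples=400, n_features=24, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

models = {
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Extra Trees": ExtraTreesClassifier(random_state=0),
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    rows.append({"Model": name, "Score": accuracy_score(y_test, model.predict(X_test))})

# Best-scoring model first.
results = pd.DataFrame(rows).sort_values("Score", ascending=False).reset_index(drop=True)
print(results)
```

For a medical classification task, accuracy alone can be misleading, so recall or a confusion matrix per model may be worth adding alongside it.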